FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
Qinglong Yang, Haoming Li, Haotian Zhao, Xiaokai Yan, Jingtao Ding, Fengli Xu, Yong Li
Mobile GUI agents are becoming critical tools for enhancing human-device interaction efficiency, with multimodal large language models (MLLMs) emerging as the dominant paradigm in this domain. Current agents, however, are limited to following explicit human instructions and therefore lack the capability to proactively anticipate user intent. They also fail to leverage the contextual information associated with users during task execution, neglecting potentially vast differences in user preferences. To address these challenges, we introduce the FingerTip benchmark. It contains two new tracks: proactive task suggestion, which requires analyzing environment observations and users' previous intents, and personalized task execution, which requires catering to users' action preferences. We collected unique human demonstrations of multi-step Android device interactions across a variety of everyday apps. These demonstrations are not isolated but are continuously acquired from users' long-term usage in their real lives, and they encompass essential user-related contextual information. Our experiments reveal the challenges posed by these tasks. A model fine-tuned on the data we collected makes effective use of user information and achieves strong results, highlighting the potential of our approach for building more user-oriented mobile GUI agents. Our code is open-sourced at https://anonymous.4open.science/r/FingerTip-57B8 for reproducibility.
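As an illustration of what the two tracks evaluate, here is a minimal scoring sketch. The episode fields, the exact-match metric for suggestions, and the stepwise-match metric for execution are assumptions for illustration only; they are not the benchmark's actual schema or metrics.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    user_id: str
    observation: str          # e.g., a serialized screenshot or UI tree
    history: list[str]        # the user's previous intents
    gold_task: str            # ground-truth proactive suggestion
    gold_actions: list[str]   # ground-truth personalized action trace

def score_proactive(pred_task: str, ep: Episode) -> float:
    """Track 1: did the agent anticipate the user's next intent?"""
    return float(pred_task.strip().lower() == ep.gold_task.strip().lower())

def score_personalized(pred_actions: list[str], ep: Episode) -> float:
    """Track 2: stepwise match against the user's preferred action trace."""
    hits = sum(p == g for p, g in zip(pred_actions, ep.gold_actions))
    return hits / max(len(ep.gold_actions), 1)

ep = Episode("u1", "<home screen>", ["order coffee at 8am"],
             "open coffee app", ["tap(app=coffee)", "tap(order_usual)"])
print(score_proactive("open coffee app", ep))        # 1.0
print(score_personalized(["tap(app=coffee)"], ep))   # 0.5
```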
A Value Based Parallel Update MCTS Method for Multi-Agent Cooperative Decision Making of Connected and Automated Vehicles
Ye Han, Lijun Zhang, Dejian Meng, Xingyu Hu, Songyu Weng
To solve the problem of joint lateral and longitudinal decision-making for multi-vehicle cooperative driving of connected and automated vehicles (CAVs), this paper proposes a Monte Carlo tree search (MCTS) method with parallel update for a multi-agent Markov game with a limited horizon and time-discounted setting. By analyzing parallel actions in the multi-vehicle joint action space under partial-steady-state traffic flow, the parallel update method can quickly exclude potentially dangerous actions, thereby increasing the search depth without sacrificing the search breadth. The proposed method is tested on a large number of randomly generated traffic flows. The experimental results show that the algorithm is robust and outperforms state-of-the-art reinforcement learning algorithms and heuristic methods. The driving strategy derived from the proposed algorithm exhibits rationality beyond that of human drivers and offers advantages in traffic efficiency and safety within the coordination zone.
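To make the pruning intuition concrete, the sketch below excludes unsafe per-vehicle action components before enumerating the joint action space, which is what lets the search go deeper without going narrower. The safety rule (a simple headway check) and all numbers are assumptions, not the paper's actual kinematic checks.

```python
import itertools

ACTIONS = ["keep", "accelerate", "brake", "lane_left", "lane_right"]

def safe(gap_m: float, action: str) -> bool:
    # Hypothetical rule: never accelerate into a headway under 10 m.
    return not (action == "accelerate" and gap_m < 10.0)

def prune_joint_actions(gaps: list[float]) -> list[tuple[str, ...]]:
    """Keep only joint actions whose every per-vehicle component is safe.

    Checking components costs O(n * |A|) instead of O(|A|**n) full
    joint-rollout checks, which is what makes the exclusion fast.
    """
    per_vehicle = [[a for a in ACTIONS if safe(g, a)] for g in gaps]
    return list(itertools.product(*per_vehicle))

# Three CAVs with headways of 5 m, 30 m, and 8 m.
joint = prune_joint_actions([5.0, 30.0, 8.0])
print(len(joint), "of", len(ACTIONS) ** 3, "joint actions survive pruning")
```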
Multi-Robot Communication-Aware Cooperative Belief Space Planning with Inconsistent Beliefs: An Action-Consistent Approach
Tanmoy Kundu, Moshe Rafaeli, Vadim Indelman
Multi-robot belief space planning (MR-BSP) is essential for reliable and safe autonomy. While planning, each robot maintains a belief over the state of the environment and reasons about how that belief would evolve in the future under different candidate actions. Yet existing MR-BSP works share a common assumption that the beliefs of different robots are consistent at planning time. This assumption is often highly unrealistic, as it requires prohibitively extensive and frequent communication. In practice, each robot may hold a different belief about the state of the environment. Crucially, when the beliefs of different robots are inconsistent, state-of-the-art MR-BSP approaches can result in a lack of coordination between the robots and, in general, can yield dangerous, unsafe, and sub-optimal decisions. In this paper, we tackle this crucial gap. We develop a novel decentralized algorithm that is guaranteed to find a consistent joint action. For a given robot, the algorithm reasons about action preferences at three levels: 1) its local information, 2) what it perceives about the other robot's reasoning, and 3) what it perceives about the other robot's perception of its own reasoning. The algorithm finds a consistent joint action whenever these levels agree on the best joint action; otherwise, it self-triggers communication between the robots. Experimental results show the efficacy of our algorithm in comparison with two baseline algorithms.
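Below is a minimal sketch of the self-triggering rule, under the simplifying assumption that each of the three reasoning levels reduces to ranking joint actions with its own value table; the names and values are illustrative, not taken from the paper.

```python
def best_joint_action(values: dict[tuple[str, str], float]) -> tuple[str, str]:
    return max(values, key=values.get)

def decide(level1, level2, level3):
    """level1: values under the robot's local belief;
    level2: values under its model of the other robot's reasoning;
    level3: values under its model of how the other robot models it.
    A joint action is committed only when all three levels agree;
    otherwise communication is self-triggered."""
    picks = {best_joint_action(v) for v in (level1, level2, level3)}
    if len(picks) == 1:
        return picks.pop(), False   # consistent joint action, no comms
    return None, True               # inconsistent: trigger communication

v1 = {("fwd", "fwd"): 1.0, ("fwd", "stop"): 0.2}
v2 = {("fwd", "fwd"): 0.9, ("fwd", "stop"): 0.1}
v3 = {("fwd", "fwd"): 0.8, ("fwd", "stop"): 0.3}
print(decide(v1, v2, v3))   # (('fwd', 'fwd'), False)
```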
Boosting Offline Reinforcement Learning with Action Preference Query
Qisen Yang, Shenzhi Wang, Matthieu Gaetan Lin, Shiji Song, Gao Huang
Training practical agents usually involves both offline and online reinforcement learning (RL) to balance the policy's performance against interaction costs. In particular, online fine-tuning has become a commonly used method to correct the erroneous estimates of out-of-distribution data learned in the offline training phase. However, even limited online interaction can be inaccessible or catastrophic in high-stakes scenarios such as healthcare and autonomous driving. In this work, we introduce an interaction-free training scheme dubbed Offline-with-Action-Preferences (OAP). The main insight is that, compared to online fine-tuning, querying preferences between pre-collected and learned actions can be equally or even more helpful for the erroneous-estimate problem. By adaptively encouraging or suppressing the policy constraint according to action preferences, OAP can distinguish overestimation from beneficial policy improvement and thus attain a more accurate evaluation of unseen data. Theoretically, we prove a lower bound on the performance improvement over the behavior policy brought by OAP. Moreover, comprehensive experiments on the D4RL benchmark and state-of-the-art algorithms demonstrate that OAP yields higher scores (29% higher on average), especially on the challenging AntMaze tasks (98% higher).
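A toy sketch of the adaptive-constraint idea follows: a stand-in preference oracle compares the dataset action with the policy's action, and the policy-constraint weight is tightened or relaxed accordingly. The oracle, the multiplicative update, and all constants are assumptions, not OAP's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def preference_oracle(state, a_data, a_policy):
    # Stand-in for a human/scripted preference query:
    # prefer whichever action is closer to the (hidden) optimum 0.
    return "data" if abs(a_data) < abs(a_policy) else "policy"

lam = 1.0   # weight on the behavior-cloning (policy) constraint
for step in range(5):
    s = rng.normal()
    a_data, a_policy = rng.normal(), rng.normal()
    if preference_oracle(s, a_data, a_policy) == "data":
        lam *= 1.1   # policy likely overestimates: tighten constraint
    else:
        lam *= 0.9   # genuine improvement: relax constraint
    print(f"step {step}: lambda = {lam:.3f}")
```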
An Alternate Policy Gradient Estimator for Softmax Policies
Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood
Policy gradient (PG) estimators for softmax policies are ineffective under sub-optimally saturated initialization, which occurs when the policy density concentrates on a sub-optimal action. Sub-optimal policy saturation may arise from poor policy initialization or from sudden changes in the environment after the policy has already converged, and softmax PG estimators require a large number of updates to recover an effective policy. This severe issue causes high sample inefficiency and poor adaptability to new situations. To mitigate this problem, we propose a novel policy gradient estimator for softmax policies that utilizes the bias in the critic estimate and the noise present in the reward signal to escape the saturated regions of the policy parameter space. Our analysis and experiments, conducted on bandit and classical MDP benchmark tasks, show that our estimator is more robust to policy saturation.
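The failure mode is easy to reproduce. The bandit sketch below shows a vanilla REINFORCE update making almost no progress from a saturated softmax; it illustrates only the problem, not the proposed estimator, and the step size and logits are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rewards = np.array([0.0, 1.0])    # arm 1 is optimal
theta = np.array([10.0, 0.0])     # saturated on the sub-optimal arm 0

rng = np.random.default_rng(0)
for step in range(3):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    grad = -pi.copy()
    grad[a] += 1.0                       # d/dtheta log pi(a)
    theta += 0.1 * rewards[a] * grad     # REINFORCE update
    # Arm 0 is sampled with prob ~0.99995 and pays reward 0,
    # so the update is almost always exactly zero.
    print(f"step {step}: pi = {pi.round(5)}")
```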
Hierarchical model-based policy optimization: from actions to action sequences and back
We develop a normative framework for hierarchical model-based policy optimization based on applying second-order methods in the space of all possible state-action paths. The resulting natural path gradient performs policy updates in a manner sensitive to the long-range correlational structure of the induced stationary state-action densities. We demonstrate that the natural path gradient can be computed exactly given an environment dynamics model, and that it depends on expressions akin to higher-order successor representations. In simulation, we show that the prioritization of local policy updates in the resulting policy flow indeed reflects the intuitive state-space hierarchy in several toy problems.
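For orientation, here is a hedged numeric sketch of the second-order preconditioning such a framework builds on, reduced to a softmax bandit where the "paths" are single actions. It shows only a generic natural-gradient step, preconditioning the vanilla gradient with the (damped) Fisher information, not the paper's path-space metric.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta = np.array([0.5, -0.5, 0.0])
rewards = np.array([1.0, 0.0, 0.5])
pi = softmax(theta)

# Vanilla gradient of expected reward J = sum_a pi(a) * r(a):
# dJ/dtheta_i = pi_i * (r_i - E_pi[r]).
grad_J = pi * (rewards - pi @ rewards)

# Fisher information F = E[g g^T] with g = grad log pi(a) = e_a - pi,
# which for a softmax is diag(pi) - pi pi^T.
F = np.diag(pi) - np.outer(pi, pi)

# F is singular for softmax (shift invariance), so use a damped solve.
nat_grad = np.linalg.solve(F + 1e-3 * np.eye(3), grad_J)
theta = theta + 0.1 * nat_grad
print("vanilla:", grad_J.round(3), "natural:", nat_grad.round(3))
```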